Robust Group Linkage
We study the problem of group linkage: linking records that refer to entities
in the same group. Applications for group linkage include finding businesses in
the same chain, finding conference attendees from the same affiliation, finding
players from the same team, etc. Group linkage faces challenges not present for
traditional record linkage. First, although different members in the same group
can share some similar global values of an attribute, they represent different
entities so can also have distinct local values for the same or different
attributes, requiring a high tolerance for value diversity. Second, groups can
be huge (with tens of thousands of records), requiring high scalability even
after using good blocking strategies.
We present a two-stage algorithm: the first stage identifies cores containing
records that are very likely to belong to the same group, while being robust to
possible erroneous values; the second stage collects strong evidence from the
cores and leverages it for merging more records into the same group, while
being tolerant to differences in local values of an attribute. Experimental
results show the high effectiveness and efficiency of our algorithm on various
real-world data sets.
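The two-stage structure described above can be sketched in a few lines. This is a hedged illustration, not the paper's algorithm: the attribute names, thresholds, and the greedy core-building and evidence-matching rules below are all illustrative assumptions.

```python
from collections import Counter

def similar(r1, r2, min_agree=2):
    """Stage-1 test (illustrative): two records agree on at least
    `min_agree` shared attributes, so one erroneous or locally
    distinct value can be tolerated."""
    agree = sum(1 for a in r1 if a in r2 and r1[a] == r2[a])
    return agree >= min_agree

def find_cores(records, min_agree=2):
    """Greedy stage 1: put a record into the first core whose every
    member passes the strict similarity test; otherwise start a new
    core. Cores thus contain records very likely in the same group."""
    cores = []
    for r in records:
        for core in cores:
            if all(similar(r, m, min_agree) for m in core):
                core.append(r)
                break
        else:
            cores.append([r])
    return cores

def merge_with_evidence(cores, leftovers):
    """Stage 2: collect the most frequent value per attribute in each
    core as strong evidence, then merge leftover records that match
    any piece of evidence (tolerant to diverse local values)."""
    for core in cores:
        evidence = {}
        for attr in {a for r in core for a in r}:
            counts = Counter(r[attr] for r in core if attr in r)
            evidence[attr] = counts.most_common(1)[0][0]
        for r in leftovers[:]:
            if any(r.get(a) == v for a, v in evidence.items()):
                core.append(r)
                leftovers.remove(r)
    return cores
```

For example, two branch records sharing a chain name and website form a core, and a third branch with a different website is merged in stage 2 because it matches the core's name evidence.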
Vehicular Fog Computing Enabled Real-time Collision Warning via Trajectory Calibration
Vehicular fog computing (VFC) has been envisioned as a promising paradigm for
enabling a variety of emerging intelligent transportation systems (ITS).
However, due to inevitable and non-negligible issues in wireless
communication, including transmission latency and packet loss, it remains
challenging to implement safety-critical applications, such as real-time
collision warning, in vehicular networks. In this paper, we present a vehicular
fog computing architecture, aiming at supporting effective and real-time
collision warning by offloading computation and communication overheads to
distributed fog nodes. With the system architecture, we further propose a
trajectory calibration based collision warning (TCCW) algorithm along with
tailored communication protocols. Specifically, an application-layer
vehicle-to-infrastructure (V2I) communication delay is fitted with a Stable
distribution using real-world field-testing data. Then, a packet loss detection
mechanism is designed. Finally, TCCW calibrates real-time vehicle trajectories
based on received vehicle status including GPS coordinates, velocity,
acceleration, heading direction, as well as the estimation of communication
delay and the detection of packet loss. For performance evaluation, we build
the simulation model and implement conventional solutions including cloud-based
warning and fog-based warning without calibration for comparison. Real-vehicle
trajectories are extracted as the input, and the simulation results demonstrate
the effectiveness of TCCW, which achieves the highest precision and recall
across a wide range of scenarios.
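The calibration step can be pictured as dead reckoning: project the delayed vehicle status forward by the estimated communication delay using the reported velocity, acceleration, and heading. The sketch below is an assumption-laden simplification of that idea (a local metric coordinate frame, constant acceleration over the delay window); it is not the paper's TCCW implementation.

```python
import math

def calibrate_position(north_m, east_m, speed, accel, heading_deg, delay_s):
    """Dead-reckoning calibration sketch (illustrative, not the TCCW
    API): project a delayed status report forward by the estimated
    communication delay. Positions are in a local metric frame
    (metres); heading is in degrees clockwise from north."""
    # Distance covered during the delay, assuming constant acceleration.
    d = speed * delay_s + 0.5 * accel * delay_s ** 2
    theta = math.radians(heading_deg)
    return (north_m + d * math.cos(theta),   # northward displacement
            east_m + d * math.sin(theta))    # eastward displacement
```

A vehicle heading due north at 20 m/s whose report arrives 100 ms late is placed 2 m ahead of its reported GPS fix before the collision check runs.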
Stability and bifurcation analysis of Westwood+ TCP congestion control model in mobile cloud computing networks
In this paper, we first build a Westwood+ TCP congestion control model with communication delay in mobile cloud computing networks. We then study the dynamics of this model by analyzing the distribution ranges of the eigenvalues of its characteristic equation. Taking communication delay as the bifurcation parameter, we derive linear stability criteria that depend on the communication delay. Furthermore, we study the direction of the Hopf bifurcation as well as the stability of the periodic solution for the Westwood+ TCP congestion control model with communication delay. We find that a Hopf bifurcation occurs when the communication delay passes a sequence of critical values. The stability and direction of the Hopf bifurcation are determined by the normal form theory and the center manifold theorem. Finally, numerical simulations are performed to verify the theoretical results.
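The way a sequence of critical delays arises can be illustrated on a generic scalar delay model; this is a textbook simplification for illustration only, not the paper's Westwood+ model, and the symbols a, b below are assumed constants.

```latex
% Generic scalar delay model (illustrative only):
\dot{x}(t) = -a\,x(t) - b\,x(t-\tau), \qquad b > a > 0.
% Its characteristic equation,
\lambda + a + b\,e^{-\lambda\tau} = 0,
% admits purely imaginary roots \lambda = i\omega exactly when
b\cos(\omega\tau) = -a, \qquad b\sin(\omega\tau) = \omega,
% so that \omega = \sqrt{b^{2}-a^{2}}, and stability is lost at the
% sequence of critical delays
\tau_k = \frac{1}{\omega}\Bigl(\arccos\!\bigl(-\tfrac{a}{b}\bigr) + 2k\pi\Bigr),
\qquad k = 0, 1, 2, \dots
```

As the delay crosses each τ_k, a pair of eigenvalues crosses the imaginary axis, which is the generic mechanism behind the Hopf bifurcations studied in the paper.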
Erasing-based lossless compression method for streaming floating-point time series
There are a prohibitively large number of floating-point time series data
generated at an unprecedentedly high rate. An efficient, compact and lossless
compression for time series data is of great importance for a wide range of
scenarios. Most existing lossless floating-point compression methods are based
on the XOR operation, but they do not fully exploit the trailing zeros, which
usually results in an unsatisfactory compression ratio. This paper proposes an
Erasing-based Lossless Floating-point compression algorithm, i.e., Elf. The
main idea of Elf is to erase the last few bits (i.e., set them to zero) of
floating-point values, so the XORed values are supposed to contain many
trailing zeros. The challenges of the erasing-based method are three-fold.
First, how to quickly determine the erased bits? Second, how to losslessly
recover the original data from the erased ones? Third, how to compactly encode
the erased data? Through rigorous mathematical analysis, Elf can directly
determine the erased bits and restore the original values without losing any
precision. To further improve the compression ratio, we propose a novel
encoding strategy for the XORed values with many trailing zeros. Furthermore,
observing that the values in a time series usually have similar significand counts,
we propose an upgraded version of Elf named Elf+ by optimizing the significand
count encoding strategy, which improves the compression ratio and reduces the
running time further. Both Elf and Elf+ work in a streaming fashion. They take
only O(N) time (where N is the length of a time series) and O(1) space,
and achieve a notable compression ratio with a theoretical guarantee. Extensive
experiments using 22 datasets show the powerful performance of Elf and Elf+
compared with 9 advanced competitors for both double-precision and
single-precision floating-point values.
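The core intuition, that erasing low significand bits makes XORed deltas end in long runs of zeros, can be demonstrated in a few lines. This is only a sketch of the erasing-then-XOR idea: unlike the real Elf, it takes the number of erased bits as a fixed parameter rather than deriving it from the value's decimal precision, so it is lossy as written and omits the encoding stage entirely.

```python
import struct

def bits(x):
    """Raw 64-bit IEEE-754 pattern of a double."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def erase(x, n_bits):
    """Erasing step (sketch): zero the lowest n_bits of the bit
    pattern. The real Elf chooses n_bits per value from its decimal
    precision so the original can be restored exactly; here it is a
    fixed, illustrative parameter."""
    mask = ~((1 << n_bits) - 1) & 0xFFFFFFFFFFFFFFFF
    return bits(x) & mask

def trailing_zeros(v):
    """Number of trailing zero bits in a 64-bit word."""
    return 64 if v == 0 else (v & -v).bit_length() - 1

def xor_stream(values, n_bits=0):
    """XOR each (optionally erased) value with its predecessor and
    report each delta's trailing-zero count -- the quantity an
    XOR-based codec exploits to encode deltas compactly."""
    prev, out = 0, []
    for x in values:
        cur = erase(x, n_bits)
        out.append(trailing_zeros(prev ^ cur))
        prev = cur
    return out
```

With n_bits = 20, every XORed delta is guaranteed at least 20 trailing zeros, whereas the raw bit patterns of values like 3.17 and 3.18 typically differ down to their lowest mantissa bits.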
Visible Red and Infrared Light Alters Gene Expression in Human Marrow Stromal Fibroblast Cells
Objectives: This study tested whether or not gene expression in human marrow stromal fibroblast (MSF) cells depends on light wavelength and energy density. Material and Methods: Primary cultures of isolated human bone marrow stem cells (hBMSC) were exposed to visible red (VR, 633 nm) and infrared (IR, 830 nm) radiation wavelengths from a light-emitting diode (LED) over a range of energy densities (0.5, 1.0, 1.5, 2.0 Joules/cm2). Cultured cells were assayed for cell proliferation, osteogenic potential, adipogenesis, and mRNA and protein content. mRNA was analyzed by microarray and compared among different wavelengths and energy densities. Mesenchymal and epithelial cell responses were compared to determine whether responses were cell-type specific. Protein array analysis was used to further analyze key pathways identified by the microarrays. Results: Different wavelengths and energy densities produced unique sets of genes identified by microarray analysis. Pathway analysis pointed to TGF beta 1 in the visible red and Akt 1 in the infrared wavelengths as key pathways to study. TGF beta protein arrays suggested switching from canonical to non-canonical TGF beta pathways with increases to longer IR wavelengths. Microarrays suggest RANKL and TIMP 10 followed IR energy density dose-response curves. Epithelial and mesenchymal cells respond differently to stimulation by light, suggesting a cell-type specific response is possible. Conclusions: These studies demonstrate differential gene expression with different wavelengths, energy densities and cell types. These differences in gene expression have the potential to be exploited for therapeutic purposes and can help explain contradictory results in the literature when wavelengths, energy densities and cell types differ.
Assessing and Enhancing Robustness of Deep Learning Models with Corruption Emulation in Digital Pathology
Deep learning in digital pathology brings intelligence and automation as
substantial enhancements to pathological analysis, the gold standard of
clinical diagnosis. However, multiple steps from tissue preparation to slide
imaging introduce various image corruptions, making it difficult for deep
neural network (DNN) models to achieve stable diagnostic results for clinical
use. In order to assess and further enhance the robustness of the models, we
analyze the physical causes of the full-stack corruptions throughout the
pathological life-cycle and propose an Omni-Corruption Emulation (OmniCE)
method to reproduce 21 types of corruptions quantified with 5-level severity.
We then construct three OmniCE-corrupted benchmark datasets at both patch level
and slide level and assess the robustness of popular DNNs in classification and
segmentation tasks. Further, we explore using the OmniCE-corrupted datasets as
augmentation data for training, and experiments verify that the generalization
ability of the models is significantly enhanced.
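The augmentation scheme described above can be sketched generically: with some probability, replace each training patch by a corrupted copy at a randomly drawn severity level. The sketch below is an assumption throughout; it stands in one corruption (additive Gaussian noise with made-up per-severity sigmas) for OmniCE's 21 physically motivated corruption types, which are not reproduced here.

```python
import numpy as np

# Illustrative stand-in for one OmniCE-style corruption: additive
# Gaussian noise quantified with 5 severity levels. The sigma values
# are assumptions, not parameters from the paper.
NOISE_SIGMA = [0.02, 0.04, 0.08, 0.12, 0.18]  # severity 1..5

def corrupt(patch, severity, rng):
    """Apply the corruption at the given severity (1-5) to an image
    patch with values in [0, 1]."""
    sigma = NOISE_SIGMA[severity - 1]
    noisy = patch + rng.normal(0.0, sigma, size=patch.shape)
    return np.clip(noisy, 0.0, 1.0)

def augment_batch(batch, rng, p=0.5):
    """Corruption-emulation augmentation: with probability p, replace
    each patch by a corrupted copy at a random severity level, so the
    model sees the full severity range during training."""
    out = []
    for patch in batch:
        if rng.random() < p:
            severity = int(rng.integers(1, 6))  # uniform over 1..5
            out.append(corrupt(patch, severity, rng))
        else:
            out.append(patch)
    return out
```

Applied inside a training loop, this exposes the model to the whole corruption severity range, which is the mechanism behind the reported robustness gains.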